Abstract



In this paper, we aim at minimizing the energy consumption when executing a divisible workload under a bound
on the total execution time, while resilience is provided through checkpointing. We discuss several variants of this 
multi-criteria problem. Given the workload, we need to decide how many chunks to use, what are the sizes of these 
chunks, and at which speed each chunk is executed. Furthermore, since a failure may occur during the execution of 
a chunk, we also need to decide at which speed a chunk should be re-executed in the event of a failure. The goal is 
to minimize the expectation of the total energy consumption, while enforcing a deadline on the execution time, that 
should be met either in expectation (soft deadline), or in the worst case (hard deadline). For each problem instance, 
we propose either an exact solution, or a function that can be optimized numerically. The different models are then 
compared through an extensive set of experiments. 

1 Introduction 

Divisible load scheduling has been extensively studied in the past years [?]. For divisible applications, the
computational workload can be divided into an arbitrary number of chunks, whose sizes can be freely chosen by the user.
Such applications occur for instance in the processing of very large data files, e.g., signal processing, linear algebra 
computation, or DNA sequencing. Traditionally, the goal is to minimize the makespan of the application, i.e., the total 
execution time. 

Nowadays, high performance computing is facing a major challenge with the increasing frequency of failures [9].
There is a need to use fault tolerance or resilience mechanisms to ensure the efficient progress and correct termination 
of the applications in the presence of failures. A well-established method to deal with failures is checkpointing: a 
checkpoint is taken at the end of the execution of each chunk. During the checkpoint, we check for the accuracy of 
the result; if the result is not correct, due to a transient failure (such as a memory error or software error), the chunk is 
re-executed. This model with transient failures is one of the most used in the literature, see for instance [?].

Furthermore, energy-awareness is now recognized as a first-class constraint in the design of new scheduling
algorithms. To help reduce energy dissipation, current processors from AMD, Intel and Transmeta allow the speed to
be set dynamically, using a dynamic voltage and frequency scaling technique (DVFS). Indeed, a processor running
at speed $s$ dissipates $s^3$ watts per unit of time [4]. We therefore focus on two objective functions: execution time
and energy consumption, while resilience is ensured through checkpointing. More precisely, we aim at minimizing
energy consumption, including that of checkpointing and re-execution in case of failure, while enforcing a bound on
execution time.

Given a workload W, we need to decide how many chunks to use, and of which sizes. Using more chunks leads to 
a higher checkpoint cost, but smaller chunks imply less computation loss (and less re-execution) when a failure occurs. 
We assume that a chunk can fail only once, i.e., we re-execute each chunk at most once. Indeed, the probability that 
a fault would strike during both the first execution and the re-execution is negligible. We discuss the accuracy of this 
assumption in Section 4.

Due to the probabilistic nature of failure hits, it is natural to study the expectation $\mathbb{E}(E)$ of the energy consumption,
because it represents the average cost over many executions. As for the bound $D$ on execution time (the deadline),
there are two relevant scenarios: either we enforce that this bound is a soft deadline to be met in expectation, or we
enforce that this bound is a hard deadline to be met in the worst case. The former scenario corresponds to flexible
environments where task deadlines can be viewed as average response times [6], while the latter scenario corresponds to
real-time environments where task deadlines are always strictly enforced [14]. In both scenarios, we have to determine
the number of chunks, their sizes, and the speed at which to execute (and possibly re-execute) every chunk.

Our first contribution is to formalize this important multi-objective problem. The general problem consists of 
finding n, the number of chunks, as well as the speeds for the execution and the re-execution of each chunk, both for 
soft and hard deadlines. We identify and discuss two important sub-cases that help tackling the most general problem 
instance: (i) a single chunk (the task is atomic); and (ii) re-execution speed is always identical to the first execution 
speed. The second contribution is a comprehensive study of all problem instances; for each instance, we propose either 
an exact solution, or a function that can be optimized numerically. We also analytically prove the accuracy of our 
model that enforces a single re-execution per chunk. We then compare the different models through an extensive set 
of experiments. We compare the optimal energy consumption under various models with a set of different parameters. 
It turns out that when $\lambda$ is small, it is sufficient to restrict the study to a single chunk, while when $\lambda$ increases, it is
better to use multiple chunks and different re-execution speeds.

The rest of the paper is organized as follows. First we discuss related work in Section 2. The model and the
optimization problems are formalized in Section 3. We discuss the accuracy of the model in Section 4. We first focus
in Section 5 on the simpler case of an atomic task, i.e., with a single chunk. The general problem with multiple chunks,
where we need to decide on the number of chunks and their sizes, is discussed in Section 6. In Section 7, we report
several experiments to assess the differences between the models, and the relative gain due to chunking or to using
different speeds for execution and re-execution. Finally, we provide some concluding remarks and future research
directions in Section 8.

2 Related work 

Dynamic power management through voltage/frequency scaling [?] utilizes the slack in a given computation to
reduce energy consumption while checkpointing. The authors of [?] utilize that slack to improve the reliability of
the computation. Hence, it is natural to explore the interplay of power management and fault tolerance [?], when both
techniques result in delaying the completion time of tasks, thus resulting in a tradeoff between power consumption,
reliability and performance. This tri-criteria optimization problem has been explored by many researchers, especially
in real-time and embedded systems where the completion time of a task is as important as the reliability of its result.

The power/reliability/performance tradeoff has been explored from many different angles. In [?], an adaptive
scheme is presented to place checkpoints based on the expected frequency of faults and is combined with dynamic
speed scaling depending on the actual occurrence of faults. Similarly, in [12], the placement of checkpoints is chosen
in a way that minimizes the total energy consumption assuming that the slack reserved for rollback recovery is used
for speed scaling if faults do not occur. In [?], the effect of frequency scaling on the fault rate was considered and
incorporated into the optimization problem. In [?], the study of the tri-criteria optimization was extended to the
case of multiple tasks executing on the same processor. In [?], a constraint logic programming-based approach is
presented to decide on the voltage levels, the start times of processes and the transmission times of messages, in such
a way that transient faults are tolerated, timing constraints are satisfied and energy is minimized.

Recently, off-line scheduling heuristics that consider the three criteria were presented for systems where active
replication, rather than fault recovery, is used to enhance reliability [1]. Selective re-execution of some tasks was
considered in [?] to achieve a given level of reliability while minimizing energy, when task graphs are scheduled on
multiprocessors with hard deadlines. Approximation algorithms for particular types of task graphs were presented to
efficiently solve the same problem in [2].

In this work, we consider two types of deadlines that are commonly used for real-time tasks: hard and soft
deadlines. In hard real-time systems [14], deadlines should be strictly met and any computation that does not meet its
deadline is not useful to the system. These systems are built to cope with worst-case scenarios, especially in critical
applications where catastrophic consequences may result from missing deadlines. Soft real-time systems [6] are more
flexible and are designed to adapt to system changes that may prevent the meeting of the deadline. They are suited to
novel applications such as multimedia and interactive systems. In these systems, it is desired to reduce the expected
completion time rather than to meet hard deadlines.



3 Framework 



Given a workload $W$, the problem is to divide $W$ into a number of chunks and to decide at which speed each chunk
is executed. In case of a transient failure during the execution of one chunk, this chunk is re-executed, possibly at a
different speed. We formalize the model in Section 3.1 and then different variants of the optimization problem are
defined in Section 3.2. Table 1 summarizes the main notations.



Symbol      Meaning
$W$         total amount of work
$s$         processor speed for first execution
$\sigma$    processor speed for re-execution
$T_C$       checkpointing time
$E_C$       energy spent for checkpointing

Table 1: List of main notations.



3.1 Model 

Consider first the case of a single chunk (or atomic task) of size $W$, denoted as SingleChunk. We execute this
chunk on a processor that can run at several speeds. We assume continuous speeds, i.e., the speed of execution can
take an arbitrary positive real value. The execution is subject to failure, and resilience is provided through the use of
checkpointing. The overhead induced by checkpointing is twofold: execution time $T_C$, and energy consumption $E_C$.
We assume that failures strike with uniform distribution, hence the probability that a failure occurs during an
execution is linearly proportional to the length of this execution. Consider the first execution of a task of size $W$
executed at speed $s$: the execution time is $T_{\text{exec}} = W/s + T_C$, hence the failure probability is
$P_{\text{fail}} = \lambda T_{\text{exec}} = \lambda(W/s + T_C)$, where $\lambda$ is the instantaneous failure rate. If there is indeed a failure,
we re-execute the task at speed $\sigma$ (which may or may not differ from $s$); the re-execution time is then
$T_{\text{reexec}} = W/\sigma + T_C$, so that the expected execution time is

$$\mathbb{E}(T) = T_{\text{exec}} + P_{\text{fail}} T_{\text{reexec}} = (W/s + T_C) + \lambda(W/s + T_C)(W/\sigma + T_C). \tag{1}$$

Similarly, the worst-case execution time is

$$T_{wc} = T_{\text{exec}} + T_{\text{reexec}} = (W/s + T_C) + (W/\sigma + T_C). \tag{2}$$

Remember that we assume success after re-execution, so we do not account for second and more re-executions.
Along the same line, we could spare the checkpoint after re-executing the last task in a series of tasks, but this unduly
complicates the analysis. In Section 4, we show that this model with only a single re-execution is accurate up to second
order terms when compared to the model with an arbitrary number of failures that follows an Exponential distribution
of parameter $\lambda$.

What is the expected energy consumed during execution? The energy consumed during the first execution at speed
$s$ is $Ws^2 + E_C$, where $E_C$ is the energy consumed during a checkpoint. The energy consumed during the second
execution at speed $\sigma$ is $W\sigma^2 + E_C$, and this execution takes place with probability
$P_{\text{fail}} = \lambda T_{\text{exec}} = \lambda(W/s + T_C)$, as before. Hence the expectation of the energy consumed is

$$\mathbb{E}(E) = (Ws^2 + E_C) + \lambda(W/s + T_C)(W\sigma^2 + E_C). \tag{3}$$

With multiple chunks (MultipleChunks model), the execution times (worst case or expected) are the sum of 
the execution times for each chunk, and the expected energy is the sum of the expected energy for each chunk (by 
linearity of expectations). 
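
As an illustration, Equations (1) to (3) can be evaluated directly for a single chunk. The Python sketch below is not part of the original study: the function names and the numerical values are assumptions chosen only to make the formulas concrete.

```python
def expected_time(W, s, sigma, Tc, lam):
    """Equation (1): first execution plus expected re-execution time."""
    t_exec = W / s + Tc
    t_reexec = W / sigma + Tc
    return t_exec + lam * t_exec * t_reexec


def worst_case_time(W, s, sigma, Tc):
    """Equation (2): the execution and the re-execution both take place."""
    return (W / s + Tc) + (W / sigma + Tc)


def expected_energy(W, s, sigma, Tc, Ec, lam):
    """Equation (3): energy of the first run plus expected re-execution energy."""
    p_fail = lam * (W / s + Tc)
    return (W * s**2 + Ec) + p_fail * (W * sigma**2 + Ec)


# Illustrative values (assumptions, not taken from the experiments).
W, s, sigma, Tc, Ec, lam = 10.0, 2.0, 4.0, 1.0, 5.0, 0.01
print(expected_time(W, s, sigma, Tc, lam))        # expected execution time
print(worst_case_time(W, s, sigma, Tc))           # worst-case execution time
print(expected_energy(W, s, sigma, Tc, Ec, lam))  # expected energy
```

With MultipleChunks, the same three functions would simply be summed over the chunks.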

We point out that the failure model is coherent with respect to chunking. Indeed, assume that a divisible task of
weight $W$ is split into two chunks of weights $w_1$ and $w_2$ (where $w_1 + w_2 = W$). Then the probability of failure for
the first chunk is $P_{\text{fail}}^1 = \lambda(w_1/s + T_C)$ and that for the second chunk is $P_{\text{fail}}^2 = \lambda(w_2/s + T_C)$. The probability
of failure $P_{\text{fail}} = \lambda(W/s + T_C)$ with a single chunk differs from the probability of failure with two chunks only
because of the extra checkpoint that is taken; if $T_C = 0$, they coincide exactly. If $T_C > 0$, there is an additional risk
to use two chunks, because the execution lasts longer by a duration $T_C$. Of course this is the price to pay for a shorter
re-execution time in case of failure: Equation (1) shows that the expected re-execution time is $P_{\text{fail}} T_{\text{reexec}}$, which is
quadratic in $W$. There is a trade-off between having many small chunks (many $T_C$ to pay, but small re-execution cost)
and a few larger chunks (fewer $T_C$, but increased re-execution cost).
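
The trade-off can be observed numerically. The following sketch is only an illustration under simplifying assumptions that are ours: all chunks have the same size, and each chunk is executed and re-executed at the same speed $s$. The function name and the parameter values are hypothetical.

```python
def expected_time_equal_chunks(W, n, s, Tc, lam):
    """Expected time of n equal chunks, each executed and re-executed at speed s.

    Each chunk contributes t + lam * t * t, where t = W/(n s) + Tc is the
    duration of one attempt (Equation (1) with sigma = s).
    """
    t_chunk = W / (n * s) + Tc
    return n * (t_chunk + lam * t_chunk * t_chunk)


# Illustrative values (assumptions): a long task with a cheap checkpoint.
W, s, Tc, lam = 100.0, 1.0, 1.0, 0.001
times = {n: expected_time_equal_chunks(W, n, s, Tc, lam) for n in range(1, 11)}
best_n = min(times, key=times.get)

for n, t in times.items():
    print(n, round(t, 3))
print("best number of chunks:", best_n)
```

For these values the expected time first decreases with the number of chunks (less work lost per failure) and then increases again (more checkpoints to pay).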

3.2 Optimization problems 

The optimization problem is stated as follows: given a deadline $D$ and a divisible task whose total computational load
is $W$, the problem is to partition the task into $n$ chunks of size $w_i$, where $\sum_{i=1}^{n} w_i = W$, and to choose for each chunk
an execution speed $s_i$ and a re-execution speed $\sigma_i$ in order to minimize the expected energy consumption:

$$\mathbb{E}(E) = \sum_{i=1}^{n} \left[ (w_i s_i^2 + E_C) + \lambda\left(\frac{w_i}{s_i} + T_C\right)(w_i\sigma_i^2 + E_C) \right],$$

subject to the constraint that the deadline is met either in expectation or in the worst case:

Expected-Deadline: $\mathbb{E}(T) = \sum_{i=1}^{n} \left( \frac{w_i}{s_i} + T_C + \lambda\left(\frac{w_i}{s_i} + T_C\right)\left(\frac{w_i}{\sigma_i} + T_C\right) \right) \leq D$
Hard-Deadline: $T_{wc} = \sum_{i=1}^{n} \left( \frac{w_i}{s_i} + T_C + \frac{w_i}{\sigma_i} + T_C \right) \leq D$

The unknowns are the number of chunks $n$, the sizes of these chunks $w_i$, the speeds for the first execution $s_i$ and the
speeds for the second execution $\sigma_i$. We consider two variants of the problem, depending upon re-execution speeds:

• SingleSpeed: in this simpler variant, the re-execution speed is always the same as the speed chosen for the
first execution. We then have to determine a single speed for each chunk: $\sigma_i = s_i$ for all $i$.

• MultipleSpeeds: in this more general variant, the re-execution speed is freely chosen, and there are two
different speeds to determine for each chunk.

We also consider the variant with a single chunk (SingleChunk), i.e., the task is atomic and we only need
to decide on its execution speed (in the SingleSpeed model), or on its execution and re-execution speeds (in the
MultipleSpeeds model). We start the study in Section 5 with this simpler problem.
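
To make the problem statement concrete, a candidate solution can be evaluated as follows. This is a minimal sketch, not the authors' code: the representation of a solution as a list of `(w_i, s_i, sigma_i)` triples, the function names, and the sample values are assumptions made for illustration.

```python
def evaluate(chunks, Tc, Ec, lam):
    """Return (expected energy, expected time, worst-case time).

    `chunks` is a list of (w_i, s_i, sigma_i) triples; the three sums follow
    the objective and the two deadline constraints of Section 3.2.
    """
    energy = exp_time = wc_time = 0.0
    for w, s, sigma in chunks:
        t_exec = w / s + Tc
        t_reexec = w / sigma + Tc
        p_fail = lam * t_exec
        energy += (w * s**2 + Ec) + p_fail * (w * sigma**2 + Ec)
        exp_time += t_exec + p_fail * t_reexec
        wc_time += t_exec + t_reexec
    return energy, exp_time, wc_time


def is_feasible(chunks, Tc, Ec, lam, D, deadline="expected"):
    """Check the Expected-Deadline ("expected") or Hard-Deadline ("hard") bound."""
    _, exp_time, wc_time = evaluate(chunks, Tc, Ec, lam)
    return (exp_time if deadline == "expected" else wc_time) <= D


# Two equal chunks, MultipleSpeeds variant (illustrative values).
chunks = [(5.0, 2.0, 4.0), (5.0, 2.0, 4.0)]
print(evaluate(chunks, Tc=1.0, Ec=5.0, lam=0.01))
print(is_feasible(chunks, 1.0, 5.0, 0.01, D=8.0, deadline="expected"))
print(is_feasible(chunks, 1.0, 5.0, 0.01, D=8.0, deadline="hard"))
```

The SingleSpeed variant corresponds to triples with `sigma_i == s_i`, and SingleChunk to a list with a single triple.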

4 Accuracy of the model 

In this section, we discuss the accuracy of this model, which accounts for a single re-execution. We compare the
expressions of the expected execution time and energy (in Equations (1) and (3)) to those obtained when adopting the more
advanced model where an arbitrary number of Exponentially distributed failures can strike during execution and
re-execution. We only deal with soft deadlines here, because no hard deadline can be enforced for the model with
Exponentially distributed failures (the execution time of a chunk can be arbitrarily large, although such an event has
low probability to occur).

Assume that failures are distributed using an Exponential distribution of parameter $\lambda$: the probability of failure
during a time interval of length $t$ is $P_{\text{fail}} = 1 - e^{-\lambda t}$. Consider a single task of size $W$ that we first execute at speed $s$.
If we detect a transient failure at the end of the execution, we re-execute the task until success, using speed $\sigma$ at each
of these new attempts. To the best of our knowledge, the expressions for $\mathbb{E}(T)$ and $\mathbb{E}(E)$ are unknown for this model,
and we establish them below:

Proposition 1. With an arbitrary number of Exponentially distributed failures and one single task of size $W$,

$$\mathbb{E}(T) = W/s + T_C + e^{\lambda(W/\sigma + T_C)}\left(1 - e^{-\lambda(W/s + T_C)}\right)(W/\sigma + T_C) \tag{4}$$
$$\mathbb{E}(E) = Ws^2 + E_C + e^{\lambda(W/\sigma + T_C)}\left(1 - e^{-\lambda(W/s + T_C)}\right)(W\sigma^2 + E_C) \tag{5}$$

Proof. With an Exponential distribution, Equation (1) can be rewritten as $\mathbb{E}(T) = T_{\text{exec}} + P_{\text{fail}}\,\mathbb{E}(T_{\text{reexec}})$, where
$T_{\text{exec}} = W/s + T_C$ and $P_{\text{fail}} = 1 - e^{-\lambda(W/s + T_C)}$. Since all re-executions are done at speed $\sigma$, the expectation of the
re-execution time obeys the following equation:

$$\mathbb{E}(T_{\text{reexec}}) = (W/\sigma + T_C) + \left(1 - e^{-\lambda(W/\sigma + T_C)}\right)\mathbb{E}(T_{\text{reexec}}).$$

We use the memoryless property of the Exponential distribution here: after a failure, the expectation of the time to
re-execute the task is exactly the same as before the failure. This leads to $\mathbb{E}(T_{\text{reexec}}) = e^{\lambda(W/\sigma + T_C)}(W/\sigma + T_C)$.
Substituting in the first equation, we end up with Equation (4). The expression of the expected energy consumption
(Equation (5)) is derived using the same line of reasoning. □

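Proposition 2. With an arbitrary number of Exponentially distributed failures and one single task of size $W$, when
$\lambda \to 0$,

$$\mathbb{E}(T) = (W/s + T_C) + \lambda(W/s + T_C)(W/\sigma + T_C) + O(\lambda^2) \tag{6}$$
$$\mathbb{E}(E) = (Ws^2 + E_C) + \lambda(W/s + T_C)(W\sigma^2 + E_C) + O(\lambda^2) \tag{7}$$

Proof. The first-order Taylor expansion of $x \mapsto e^x$ around $0$, applied to Equation (4), gives, when $\lambda \to 0$:

$$\mathbb{E}(T) = (W/s + T_C) + \left(1 + \lambda(W/\sigma + T_C) + O(\lambda^2(W/\sigma + T_C)^2)\right) \times \left(\lambda(W/s + T_C) + O(\lambda^2(W/s + T_C)^2)\right)(W/\sigma + T_C).$$

Hence,

$$\mathbb{E}(T) = (W/s + T_C) + \left(\lambda(W/s + T_C) + O(\lambda^2)\right)(W/\sigma + T_C)$$
$$\mathbb{E}(T) = (W/s + T_C) + \lambda(W/s + T_C)(W/\sigma + T_C) + O(\lambda^2)$$

Again, the energy formula is built using the same rationale. □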

As a consequence of Proposition 2, the formulas that we consider with one single re-execution (Equations (1)
and (3)) are accurate up to second order terms when compared to the model with an arbitrary number of Exponential
failures. Note that this result is not obvious, because we drop a potentially arbitrarily large number of re-executions in
the linear model with at most one re-execution. Furthermore, the result extends naturally when considering a divisible
task and MultipleChunks, since the result holds for each chunk, and by summation, one single re-execution of
each chunk is accurate up to second order terms.
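
The second-order agreement can be checked numerically. The sketch below, which is our own illustration and not part of the original analysis, compares the expected time of Equation (1) with that of Equation (4) and divides the gap by $\lambda^2$; the function names and the default parameter values are assumptions.

```python
import math


def expected_time_single_reexec(W, s, sigma, Tc, lam):
    """Equation (1): at most one re-execution."""
    t_exec, t_reexec = W / s + Tc, W / sigma + Tc
    return t_exec + lam * t_exec * t_reexec


def expected_time_exponential(W, s, sigma, Tc, lam):
    """Equation (4): arbitrary number of Exponentially distributed failures."""
    t_exec, t_reexec = W / s + Tc, W / sigma + Tc
    p_fail = -math.expm1(-lam * t_exec)  # 1 - exp(-lam * t_exec), computed accurately
    return t_exec + math.exp(lam * t_reexec) * p_fail * t_reexec


def gap_over_lambda_squared(lam, W=10.0, s=2.0, sigma=4.0, Tc=1.0):
    """Difference between the two models, scaled by lam^2."""
    gap = (expected_time_exponential(W, s, sigma, Tc, lam)
           - expected_time_single_reexec(W, s, sigma, Tc, lam))
    return gap / lam**2


# If the two models agree up to second-order terms, the scaled gap
# should stay bounded as lam decreases.
for lam in (1e-2, 1e-3, 1e-4):
    print(lam, gap_over_lambda_squared(lam))
```

For the default values, a direct expansion of Equation (4) suggests that the scaled gap tends to $T_{\text{reexec}}(T_{\text{exec}} T_{\text{reexec}} - T_{\text{exec}}^2/2) = 10.5$ as $\lambda \to 0$.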
5 With a single chunk

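In this section, we consider the case of a single chunk, or equivalently of an atomic task: given a non-divisible
workload $W$ and a deadline $D$, find the values of $s$ and $\sigma$ that minimize

$$\mathbb{E}(E) = (Ws^2 + E_C) + \lambda\left(\frac{W}{s} + T_C\right)(W\sigma^2 + E_C)$$

subject to

$$\mathbb{E}(T) = \left(\frac{W}{s} + T_C\right) + \lambda\left(\frac{W}{s} + T_C\right)\left(\frac{W}{\sigma} + T_C\right) \leq D$$

in the Expected-Deadline model, and subject to

$$\frac{W}{s} + T_C + \frac{W}{\sigma} + T_C \leq D$$

in the Hard-Deadline model. We first deal with the SingleSpeed model, where we enforce $\sigma = s$, before moving
on to the MultipleSpeeds model.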

5.1 Single speed model 

In this section, we express $\mathbb{E}(E)$ as a function of the speed $s$. That is, $\mathbb{E}(E)(s) = (Ws^2 + E_C)(1 + \lambda(W/s + T_C))$.
The following result is valid for both Expected-Deadline and Hard-Deadline models.

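Lemma 1. $\mathbb{E}(E)$ is convex on $\mathbb{R}_+^*$. It admits a unique minimum, reached at

$$s^* = \frac{\lambda W}{6(1 + \lambda T_C)} \left( -\frac{\left(3\sqrt{3}\sqrt{27a^2 - 4a} - 27a + 2\right)^{1/3}}{2^{1/3}} - \frac{2^{1/3}}{\left(3\sqrt{3}\sqrt{27a^2 - 4a} - 27a + 2\right)^{1/3}} - 1 \right) \tag{8}$$

where $a = \lambda E_C \, \dfrac{\left(2(1 + \lambda T_C)\right)^2}{(\lambda W)^3}$.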

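Proof. Let us prove that $g(s) = \mathbb{E}(E)(s)$ is convex and admits a unique minimum: we have
$g'(s) = 2W(1 + \lambda T_C)s + \lambda W^2 - \frac{\lambda W E_C}{s^2}$ and $g''(s) = 2W(1 + \lambda T_C) + \frac{2\lambda W E_C}{s^3} > 0$.
This function is strictly convex on $\mathbb{R}_+^*$, and $g'(s) \to -\infty$ when $s \to 0^+$, $g'(s) \to +\infty$ when $s \to +\infty$;
thus there exists a unique minimum.

Let us find the minimum. For $s > 0$, we have:

$$g'(s) = 0 \iff s^3 + \frac{\lambda W}{2(1 + \lambda T_C)} s^2 - \frac{\lambda E_C}{2(1 + \lambda T_C)} = 0.$$

Using a computer algebra software, it is easy to show that the minimum is obtained at the value $s = s^*$ given by
Equation (8). □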
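
Rather than evaluating the closed form, the minimizer can also be obtained numerically from the derivative. The sketch below is an assumption-laden illustration (the function names are ours): it relies only on the fact that $g'$ is increasing, which follows from the convexity of $g$, and it assumes $\lambda > 0$ and $E_C > 0$ so that $g'$ changes sign.

```python
def d_energy(s, W, Tc, Ec, lam):
    """Derivative g'(s) of g(s) = (W s^2 + Ec)(1 + lam (W/s + Tc))."""
    return 2 * W * (1 + lam * Tc) * s + lam * W**2 - lam * W * Ec / s**2


def unconstrained_speed(W, Tc, Ec, lam, rel_tol=1e-12):
    """Root of g' on (0, +inf) by bisection (assumes lam > 0 and Ec > 0)."""
    lo, hi = 1e-9, 1.0
    while d_energy(lo, W, Tc, Ec, lam) >= 0:   # move left until g' < 0
        lo /= 2
    while d_energy(hi, W, Tc, Ec, lam) <= 0:   # move right until g' > 0
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if d_energy(mid, W, Tc, Ec, lam) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative values (assumptions).
W, Tc, Ec, lam = 10.0, 1.0, 5.0, 0.01
s_star = unconstrained_speed(W, Tc, Ec, lam)
print(s_star)
```

The returned value should also satisfy the cubic equation of the proof, which gives a simple consistency check.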



5.1.1 Expected deadline 

In the SingleSpeed Expected-Deadline model, we denote by $\mathbb{E}(T)(s) = (W/s + T_C)(1 + \lambda(W/s + T_C))$ the
constraint on the execution time.

Lemma 2. For any $D$, if $T_C + \lambda T_C^2 \geq D$, then there is no solution. Otherwise, the constraint on the execution time
can be rewritten as

$$s \in \left[ W\,\frac{1 + 2\lambda T_C + \sqrt{4\lambda D + 1}}{2(D - T_C(1 + \lambda T_C))},\ +\infty \right[.$$

Proof. The function $s \mapsto \mathbb{E}(T)(s)$ is strictly decreasing and converges to $T_C + \lambda T_C^2$. Hence, if $T_C + \lambda T_C^2 \geq D$, then
there is no solution. Else there exists a minimum speed $s_0$ such that $\mathbb{E}(T)(s_0) = D$, and for all $s \geq s_0$, $\mathbb{E}(T)(s) \leq D$.

More precisely, $s_0 = W\,\frac{1 + 2\lambda T_C + \sqrt{4\lambda D + 1}}{2(D - T_C(1 + \lambda T_C))}$: since there is a unique solution to $\mathbb{E}(T)(s) = D$, we can solve
this equation in order to find $s_0$. □

To simplify the following results, we define

$$s_0 = W\,\frac{1 + 2\lambda T_C + \sqrt{4\lambda D + 1}}{2(D - T_C(1 + \lambda T_C))}.$$
Proposition 3. In the SingleSpeed model, it is possible to numerically compute the optimal solution for
SingleChunk as follows:

1. If $T_C + \lambda T_C^2 \geq D$, then there is no solution;

2. Else, the optimal speed is $\max(s_0, s^*)$.

Proof. This is a corollary of Lemma 1: because $s \mapsto \mathbb{E}(E)(s)$ is convex on $\mathbb{R}_+^*$, its restriction to the interval
$[s_0, +\infty[$ is also convex and admits a unique minimum:

• if $s^* < s_0$, then $\mathbb{E}(E)(s)$ is increasing on $[s_0, +\infty[$, and the optimal solution is $s_0$;

• else, clearly the minimum is reached when $s = s^*$.

The optimal solution is then $\max(s_0, s^*)$. □



5.1.2 Hard deadline 

In the Hard-Deadline model, the bound on the execution time can be written as $2\left(\frac{W}{s} + T_C\right) \leq D$.

Lemma 3. In the SingleSpeed Hard-Deadline model, for any $D$, if $2T_C \geq D$, then there is no solution.
Otherwise, the constraint on the execution time can be rewritten as $s \in \left[ \frac{W}{D/2 - T_C},\ +\infty \right[$.

Proof. The constraint on the execution time is now $2\left(\frac{W}{s} + T_C\right) \leq D$. □

Proposition 4. Let $s^*$ be the solution indicated in Equation (8). In the SingleSpeed Hard-Deadline model, if
$2T_C \geq D$, then there is no solution. Otherwise, the minimum is reached when $s = \max\left(s^*, \frac{W}{D/2 - T_C}\right)$.

Proof. The fact that there is no solution when $2T_C \geq D$ comes from Lemma 3. Otherwise, the result is obvious by
convexity of the expected energy function. □
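
Propositions 3 and 4 translate into a short procedure: compute the smallest feasible speed, compute the unconstrained minimizer $s^*$, and keep the larger of the two. The sketch below is an illustration under assumptions of ours: $s^*$ is found by bisection on $g'$ instead of the closed form of Equation (8), $\lambda$ and $E_C$ are taken positive, and all function names are hypothetical.

```python
import math


def unconstrained_speed(W, Tc, Ec, lam, rel_tol=1e-12):
    """Minimizer s* of (W s^2 + Ec)(1 + lam (W/s + Tc)), by bisection on g'."""
    def d(s):
        return 2 * W * (1 + lam * Tc) * s + lam * W**2 - lam * W * Ec / s**2
    lo, hi = 1e-9, 1.0
    while d(lo) >= 0:
        lo /= 2
    while d(hi) <= 0:
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if d(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def min_speed(W, Tc, lam, D, deadline):
    """Smallest feasible speed, or None if the deadline cannot be met."""
    if deadline == "expected":            # Lemma 2
        slack = D - Tc * (1 + lam * Tc)
        if slack <= 0:
            return None
        return W * (1 + 2 * lam * Tc + math.sqrt(4 * lam * D + 1)) / (2 * slack)
    slack = D / 2 - Tc                    # Lemma 3 (hard deadline)
    return W / slack if slack > 0 else None


def optimal_single_speed(W, Tc, Ec, lam, D, deadline="expected"):
    """Optimal SingleSpeed/SingleChunk speed, or None if there is no solution."""
    s0 = min_speed(W, Tc, lam, D, deadline)
    if s0 is None:
        return None
    return max(s0, unconstrained_speed(W, Tc, Ec, lam))


# Illustrative values (assumptions).
print(optimal_single_speed(10.0, 1.0, 5.0, 0.01, 8.0, "expected"))
print(optimal_single_speed(10.0, 1.0, 5.0, 0.01, 8.0, "hard"))
```

When the deadline is loose, the returned speed is $s^*$ and the deadline is not tight; when it is stringent, the speed is the smallest feasible one.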

5.2 Multiple speeds model 

In this section, we consider the general MultipleSpeeds model. We use the following notation:

$$\mathbb{E}(E)(s, \sigma) = (Ws^2 + E_C) + \lambda(W/s + T_C)(W\sigma^2 + E_C).$$

Let us first introduce a preliminary lemma:

Lemma 4 (Convexity SingleChunk). The problem of minimizing $A_0 + \alpha_0 x^2$ under the constraint $A_1 + \frac{\alpha_1}{x} \leq A_2$,
where $A_0$, $A_1$, $A_2$ are constants and $\alpha_0$, $\alpha_1$ are positive constants, is solved when $x$ is minimum, that is when
$A_1 + \frac{\alpha_1}{x} = A_2$.

Proof. The function $x \mapsto A_0 + \alpha_0 x^2$ is strictly increasing for $x > 0$, so it is minimized when $x$ is minimum. The function
$x \mapsto A_1 + \frac{\alpha_1}{x}$ is strictly decreasing with $\lim_{x \to 0^+} = +\infty$, hence the upper bound $A_2$ is reached when $x$ is minimum.
With those two results, we can say that the constraint should be tight in order to solve our problem. □

5.2.1 Expected deadline

The execution time in the MultipleSpeeds Expected-Deadline model can be written as

$$\mathbb{E}(T)(s, \sigma) = (W/s + T_C) + \lambda(W/s + T_C)(W/\sigma + T_C).$$

We start by giving a useful property, namely that the deadline is always tight in the MultipleSpeeds
Expected-Deadline model:

Lemma 5. In the MultipleSpeeds Expected-Deadline model, in order to minimize the energy consumption,
the deadline should be tight.

Proof. Considering $s$ and $W$ fixed, then $\mathbb{E}(T)(s, \sigma) = T_0 + \frac{\alpha}{\sigma} \leq D$, and $\mathbb{E}(E)(s, \sigma) = E_0 + \alpha\sigma^2$, where
$T_0 = (W/s + T_C) + \lambda T_C(W/s + T_C)$, $E_0 = (Ws^2 + E_C) + \lambda E_C(W/s + T_C)$ and $\alpha = \lambda W(W/s + T_C)$ are constant.
With Lemma 4, we conclude that the deadline should be tight. □

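This lemma allows us to express $\sigma$ as a function of $s$:

$$\sigma = \frac{\lambda W}{\dfrac{D}{W/s + T_C} - (1 + \lambda T_C)}.$$

Also we reduce the bi-criteria problem to the minimization problem of the single-variable function:

$$s \mapsto Ws^2 + E_C + \lambda\left(\frac{W}{s} + T_C\right)\left( W\left( \frac{\lambda W}{\dfrac{D}{W/s + T_C} - (1 + \lambda T_C)} \right)^2 + E_C \right) \tag{10}$$

which can be solved numerically.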
5.2.2 Hard deadline 

In this model we have similar results as with Expected-Deadline. The constraint on the execution time writes:
$\frac{W}{s} + T_C + \frac{W}{\sigma} + T_C \leq D$. Another corollary of Lemma 4 is:

Lemma 6. In the MultipleSpeeds Hard-Deadline model, in order to minimize the energy consumption,
the deadline should be tight.

This lemma allows us to express $\sigma$ as a function of $s$:

$$\sigma = \frac{W}{(D - 2T_C)s - W}\, s.$$

Finally, we reduce the bi-criteria problem to the minimization problem of the single-variable function:

$$s \mapsto Ws^2 + E_C + \lambda\left(\frac{W}{s} + T_C\right)\left( W\left( \frac{W}{(D - 2T_C)s - W}\, s \right)^2 + E_C \right) \tag{11}$$

which can be solved numerically.
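
One possible way to carry out the numerical resolution of the single-variable problems (10) and (11) is sketched below. This is an illustration, not the method used for the experiments: golden-section search is our choice and assumes that the objective is unimodal on the search interval; the function names, the search bounds and the sample values are assumptions as well.

```python
import math


def reexec_speed(s, W, Tc, lam, D, deadline):
    """Re-execution speed that makes the deadline tight (None if s is too slow)."""
    if deadline == "expected":
        denom = D / (W / s + Tc) - (1 + lam * Tc)
        return lam * W / denom if denom > 0 else None
    denom = (D - 2 * Tc) * s - W              # hard deadline
    return W * s / denom if denom > 0 else None


def energy(s, W, Tc, Ec, lam, D, deadline):
    """Objective (10) or (11); +inf when no re-execution speed is feasible."""
    sigma = reexec_speed(s, W, Tc, lam, D, deadline)
    if sigma is None:
        return math.inf
    return W * s**2 + Ec + lam * (W / s + Tc) * (W * sigma**2 + Ec)


def golden_section(f, lo, hi, iters=100):
    """Minimizer of f on [lo, hi], assuming f is unimodal there."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                                  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def optimal_speeds(W, Tc, Ec, lam, D, deadline="expected"):
    """Return (s, sigma) minimizing the expected energy, or None if infeasible."""
    room = D / (1 + lam * Tc) - Tc if deadline == "expected" else D - 2 * Tc
    if room <= 0:
        return None
    s_lo = W / room                            # at or below s_lo, no sigma is feasible
    f = lambda s: energy(s, W, Tc, Ec, lam, D, deadline)
    s_hi = math.sqrt(f(2 * s_lo) / W)          # beyond s_hi, W s^2 alone exceeds f(2 s_lo)
    s = golden_section(f, s_lo, s_hi)
    return s, reexec_speed(s, W, Tc, lam, D, deadline)


# Illustrative values (assumptions).
print(optimal_speeds(10.0, 1.0, 5.0, 0.01, 8.0, "expected"))
print(optimal_speeds(10.0, 1.0, 5.0, 0.01, 8.0, "hard"))
```

By construction the returned pair makes the deadline tight, in agreement with Lemmas 5 and 6.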

6 Several chunks 

In this section, we deal with the general problem of a divisible task of size $W$ that can be split into an arbitrary number
of chunks. We divide the task into $n$ chunks of size $w_i$ such that $\sum_{i=1}^{n} w_i = W$. Each chunk is executed once at speed
$s_i$, and re-executed (if necessary) at speed $\sigma_i$. The problem is to find the values of $n$, $w_i$, $s_i$ and $\sigma_i$ that minimize

$$\mathbb{E}(E) = \sum_{i} (w_i s_i^2 + E_C) + \lambda \sum_{i} \left(\frac{w_i}{s_i} + T_C\right)(w_i\sigma_i^2 + E_C)$$

subject to

$$\sum_{i} \left( \frac{w_i}{s_i} + T_C + \lambda\left(\frac{w_i}{s_i} + T_C\right)\left(\frac{w_i}{\sigma_i} + T_C\right) \right) \leq D$$

in the Expected-Deadline model, and subject to

$$\sum_{i} \left( \frac{w_i}{s_i} + T_C + \frac{w_i}{\sigma_i} + T_C \right) \leq D$$

in the Hard-Deadline model. We first deal with the SingleSpeed model, where we enforce $\sigma_i = s_i$, before
dealing with the MultipleSpeeds model.

6.1 Single speed model 
6.1.1 Expected deadline 

In this section, we deal with the SingleSpeed Expected-Deadline model and consider that for all $i$, $\sigma_i = s_i$.
Then:

$$\mathbb{E}(E)\left(\cup_i (w_i, s_i, s_i)\right) = \sum_{i} (w_i s_i^2 + E_C)\left(1 + \lambda\left(\frac{w_i}{s_i} + T_C\right)\right)$$

Theorem 1. In the optimal solution to the problem with the SingleSpeed Expected-Deadline model, all n 
chunks are of equal size W/n and executed at the same speed s. 

Proof. Consider the optimal solution, and assume by contradiction that it includes two chunks $w_1$ and $w_2$, executed
at speeds $s_1$ and $s_2$, where either $s_1 \neq s_2$, or $s_1 = s_2$ and $w_1 \neq w_2$. Let us assume without loss of generality that
$\frac{w_1}{s_1} \geq \frac{w_2}{s_2}$.

We show that we can find a strictly better solution where both chunks have size $w = \frac{1}{2}(w_1 + w_2)$, and are executed
at the same speed $s$ (to be defined later). The size and speed of the other chunks are kept the same. We will show that the
execution time of the new solution is not larger than in the optimal solution, while its energy consumption is strictly
smaller, hence leading to the contradiction.

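We have seen that

$$\mathbb{E}(T)((w_1, s_1), (w_2, s_2)) = \frac{w_1}{s_1} + T_C + \frac{w_2}{s_2} + T_C + \lambda\left(\frac{w_1}{s_1} + T_C\right)^2 + \lambda\left(\frac{w_2}{s_2} + T_C\right)^2$$
$$\mathbb{E}(T)((w, s), (w, s)) = 2\left(\frac{w}{s} + T_C\right) + 2\lambda\left(\frac{w}{s} + T_C\right)^2$$

Hence,

$$\mathbb{E}(T)((w_1, s_1), (w_2, s_2)) - \mathbb{E}(T)((w, s), (w, s)) = (1 + 2\lambda T_C)\left(\frac{w_1}{s_1} + \frac{w_2}{s_2} - \frac{2w}{s}\right) + \lambda\left(\left(\frac{w_1}{s_1}\right)^2 + \left(\frac{w_2}{s_2}\right)^2 - 2\left(\frac{w}{s}\right)^2\right)$$

Similarly, we know that:

$$\mathbb{E}(E)((w_1, s_1), (w_2, s_2)) = w_1 s_1^2 + E_C + w_2 s_2^2 + E_C + \lambda\left(\frac{w_1}{s_1} + T_C\right)(w_1 s_1^2 + E_C) + \lambda\left(\frac{w_2}{s_2} + T_C\right)(w_2 s_2^2 + E_C)$$
$$\mathbb{E}(E)((w, s), (w, s)) = 2(ws^2 + E_C) + 2\lambda\left(\frac{w}{s} + T_C\right)(ws^2 + E_C)$$

and deduce

$$\mathbb{E}(E)((w_1, s_1), (w_2, s_2)) - \mathbb{E}(E)((w, s), (w, s)) = (w_1 s_1^2 + w_2 s_2^2 - 2ws^2)(1 + \lambda T_C) + \lambda E_C\left(\frac{w_1}{s_1} + \frac{w_2}{s_2} - \frac{2w}{s}\right) + \lambda(w_1^2 s_1 + w_2^2 s_2 - 2w^2 s) \tag{12}$$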
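Let us now define

$$s_A = \frac{2w}{\frac{w_1}{s_1} + \frac{w_2}{s_2}} = \frac{w_1 + w_2}{\frac{w_1}{s_1} + \frac{w_2}{s_2}}, \qquad s_B = \frac{\sqrt{2}\, w}{\left(\left(\frac{w_1}{s_1}\right)^2 + \left(\frac{w_2}{s_2}\right)^2\right)^{1/2}} = \frac{w_1 + w_2}{\sqrt{2}\left(\left(\frac{w_1}{s_1}\right)^2 + \left(\frac{w_2}{s_2}\right)^2\right)^{1/2}}.$$

We then fix $s = \max(s_A, s_B)$. Then, since $s \geq s_A$, we have $\frac{w_1}{s_1} + \frac{w_2}{s_2} - \frac{2w}{s} \geq 0$, and since $s \geq s_B$, we have
$\left(\frac{w_1}{s_1}\right)^2 + \left(\frac{w_2}{s_2}\right)^2 - 2\left(\frac{w}{s}\right)^2 \geq 0$. This ensures that $\mathbb{E}(T)((w_1, s_1), (w_2, s_2)) - \mathbb{E}(T)((w, s), (w, s)) \geq 0$.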



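Note that

$$2\left(\frac{w}{s_B}\right)^2 - 2\left(\frac{w}{s_A}\right)^2 = \left(\frac{w_1}{s_1}\right)^2 + \left(\frac{w_2}{s_2}\right)^2 - \frac{1}{2}\left(\frac{w_1}{s_1} + \frac{w_2}{s_2}\right)^2 = \frac{1}{2}\left(\frac{w_1}{s_1} - \frac{w_2}{s_2}\right)^2 \geq 0.$$

This means that $s_A \geq s_B$, hence $s = s_A$. To prove that $\mathbb{E}(E)((w_1, s_1), (w_2, s_2)) - \mathbb{E}(E)((w, s), (w, s)) > 0$, we
want to show that:

1. $w_1 s_1^2 + w_2 s_2^2 - 2ws^2 \geq 0$;

2. $\frac{w_1}{s_1} + \frac{w_2}{s_2} - \frac{2w}{s} \geq 0$;

3. $w_1^2 s_1 + w_2^2 s_2 - 2w^2 s \geq 0$;

4. and that one of the previous inequalities is strict.

Note that by definition of $s = s_A$, the second inequality is true.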

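Let us first show that $w_1 s_1^2 + w_2 s_2^2 - 2ws^2 \geq 0$. Since $s = s_A$ and $2w = w_1 + w_2$, we have
$2ws^2\left(\frac{w_1}{s_1} + \frac{w_2}{s_2}\right)^2 = (w_1 + w_2)^3$, hence

$$\begin{aligned}
\left(\frac{w_1}{s_1} + \frac{w_2}{s_2}\right)^2 \left(w_1 s_1^2 + w_2 s_2^2 - 2ws^2\right)
&= w_1^3 + w_1 w_2^2\left(\frac{s_1}{s_2}\right)^2 + 2w_1^2 w_2\frac{s_1}{s_2} + w_2^3 + w_2 w_1^2\left(\frac{s_2}{s_1}\right)^2 + 2w_2^2 w_1\frac{s_2}{s_1} - (w_1 + w_2)^3 \\
&= w_1 w_2^2\, g\!\left(\frac{s_1}{s_2}\right) + w_1^2 w_2\, g\!\left(\frac{s_2}{s_1}\right)
\end{aligned}$$

where $g : u \mapsto u^2 + \frac{2}{u} - 3$. It is easy to show that $g$ is nonnegative on $\mathbb{R}_+^*$: indeed, $g'(u) = \frac{2}{u^2}(u^3 - 1)$ is negative in
$]0, 1[$ and positive in $]1, \infty[$, and the unique minimum is $g(1) = 0$. We derive that $w_1 s_1^2 + w_2 s_2^2 - 2ws^2 \geq 0$.

Let us now show that $w_1^2 s_1 + w_2^2 s_2 - 2w^2 s \geq 0$. Remember that $2w = w_1 + w_2$ and that $s = s_A$, so that

$$\begin{aligned}
2\left(\frac{w_1}{s_1} + \frac{w_2}{s_2}\right)\left(w_1^2 s_1 + w_2^2 s_2 - 2w^2 s\right)
&= 2w_1^3 + 2w_1^2 w_2\frac{s_1}{s_2} + 2w_2^3 + 2w_1 w_2^2\frac{s_2}{s_1} - (w_1 + w_2)^3 \\
&= w_1^3 + w_2^3 + w_1^2 w_2\left(2\frac{s_1}{s_2} - 3\right) + w_1 w_2^2\left(2\frac{s_2}{s_1} - 3\right).
\end{aligned}$$

Remember that we assumed without loss of generality that $\frac{w_1}{s_1} \geq \frac{w_2}{s_2}$. We then obtain that this last expression is
nonnegative, hence $w_1^2 s_1 + w_2^2 s_2 - 2w^2 s \geq 0$.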



Let us now conclude our study: if $\frac{s_1}{s_2} \neq 1$, then the energy consumption of the optimal solution is strictly greater
than the one from our solution, which is a contradiction. Hence we must have $s_1 = s_2$, and $w_1 \neq w_2$ (in fact, since
we assumed that $\frac{w_1}{s_1} \geq \frac{w_2}{s_2}$, we must have $w_1 > w_2$). Then we can refine the previous analysis, and obtain that
$w_1^2 s_1 + w_2^2 s_2 - 2w^2 s > 0$: again, the optimal energy consumption is strictly greater than in our solution; this is the
final contradiction and concludes the proof. □
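
The exchange argument of this proof can be sanity-checked numerically. The sketch below is only an illustration and proves nothing: it draws random pairs of chunks, replaces each pair by two equal chunks executed at speed $\max(s_A, s_B)$, and checks that neither the expected time nor the expected energy of the pair increases. The function names, the sampling ranges and the parameter values are assumptions.

```python
import math
import random


def pair_time(chunks, Tc, lam):
    """Expected time of SingleSpeed chunks given as (w, s) pairs."""
    return sum((w / s + Tc) * (1 + lam * (w / s + Tc)) for w, s in chunks)


def pair_energy(chunks, Tc, Ec, lam):
    """Expected energy of SingleSpeed chunks given as (w, s) pairs."""
    return sum((w * s**2 + Ec) * (1 + lam * (w / s + Tc)) for w, s in chunks)


def merge(w1, s1, w2, s2):
    """Two equal chunks of size (w1 + w2)/2 run at speed max(s_A, s_B)."""
    w = (w1 + w2) / 2
    x1, x2 = w1 / s1, w2 / s2
    s_a = 2 * w / (x1 + x2)
    s_b = math.sqrt(2) * w / math.sqrt(x1**2 + x2**2)
    s = max(s_a, s_b)
    return [(w, s), (w, s)]


def merging_never_hurts(trials=10_000, seed=0, Tc=1.0, Ec=5.0, lam=0.01):
    """Random test of the exchange argument, up to a small rounding tolerance."""
    rng = random.Random(seed)
    for _ in range(trials):
        w1, w2, s1, s2 = (rng.uniform(0.1, 10.0) for _ in range(4))
        before = [(w1, s1), (w2, s2)]
        after = merge(w1, s1, w2, s2)
        t0, t1 = pair_time(before, Tc, lam), pair_time(after, Tc, lam)
        e0, e1 = pair_energy(before, Tc, Ec, lam), pair_energy(after, Tc, Ec, lam)
        if t1 > t0 * (1 + 1e-12) or e1 > e0 * (1 + 1e-12):
            return False
    return True


print(merging_never_hurts())
```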

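Thanks to this result, we know that the problem with $n$ chunks can be rewritten as follows: find $s$ such that

$$n\left(\frac{W}{ns} + T_C\right) + n\lambda\left(\frac{W}{ns} + T_C\right)^2 = \frac{W}{s} + nT_C + \frac{\lambda}{n}\left(\frac{W}{s} + nT_C\right)^2 \leq D$$

in order to minimize

$$n\left(\frac{W}{n}s^2 + E_C\right) + n\lambda\left(\frac{W}{ns} + T_C\right)\left(\frac{W}{n}s^2 + E_C\right) = \left(Ws^2 + nE_C\right) + \frac{\lambda}{n}\left(\frac{W}{s} + nT_C\right)\left(Ws^2 + nE_C\right).$$

One can see that this reduces to the SingleChunk problem with the SingleSpeed model (Section 5.1) up to
the following parameter changes:

• $\lambda \leftarrow \lambda/n$

• $T_C \leftarrow nT_C$

• $E_C \leftarrow nE_C$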

If the number of chunks $n$ is given, we can express the minimum speed such that there is a solution with $n$ chunks:

$$s_0(n) = W\,\frac{1 + 2\lambda T_C + \sqrt{\frac{4\lambda D}{n} + 1}}{2(D - nT_C(1 + \lambda T_C))}. \tag{13}$$



We can verify that when $D \leq nT_C(1 + \lambda T_C)$, there is no solution, hence obtaining an upper bound on $n$. Therefore,
the two-variable problem (with unknowns $n$ and $s$) can be solved numerically.
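
Since $n$ is bounded, one simple (not necessarily the most efficient) way to solve the two-variable problem is to scan all feasible values of $n$ and to solve the equivalent SingleChunk problem for each of them. The sketch below illustrates this idea under assumptions of ours: $T_C > 0$, $\lambda > 0$, $E_C > 0$, the unconstrained minimizer is found by bisection rather than with Equation (8), and all function names are hypothetical.

```python
import math


def unconstrained_speed(W, Tc, Ec, lam, rel_tol=1e-12):
    """Minimizer of (W s^2 + Ec)(1 + lam (W/s + Tc)), by bisection on g'."""
    def d(s):
        return 2 * W * (1 + lam * Tc) * s + lam * W**2 - lam * W * Ec / s**2
    lo, hi = 1e-9, 1.0
    while d(lo) >= 0:
        lo /= 2
    while d(hi) <= 0:
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if d(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def solve_for_n(W, Tc, Ec, lam, D, n):
    """Best (energy, speed) with n equal chunks, or None if D cannot be met."""
    lam_n, Tc_n, Ec_n = lam / n, n * Tc, n * Ec      # parameter changes
    slack = D - Tc_n * (1 + lam_n * Tc_n)            # equals D - n Tc (1 + lam Tc)
    if slack <= 0:
        return None
    s0 = W * (1 + 2 * lam_n * Tc_n + math.sqrt(4 * lam_n * D + 1)) / (2 * slack)
    s = max(s0, unconstrained_speed(W, Tc_n, Ec_n, lam_n))
    energy = (W * s**2 + Ec_n) * (1 + lam_n * (W / s + Tc_n))
    return energy, s


def best_chunking(W, Tc, Ec, lam, D):
    """Scan n = 1, 2, ... until infeasible; return (energy, n, s) or None."""
    if Tc <= 0:
        raise ValueError("this scan needs Tc > 0 to bound the number of chunks")
    best, n = None, 1
    while True:
        sol = solve_for_n(W, Tc, Ec, lam, D, n)
        if sol is None:                              # larger n are infeasible too
            return best
        if best is None or sol[0] < best[0]:
            best = (sol[0], n, sol[1])
        n += 1


# Illustrative values (assumptions).
print(best_chunking(100.0, 1.0, 5.0, 0.001, 150.0))
```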
6.1.2 Hard deadline



In the Hard-Deadline model, all results still hold, they are even easier to prove since we do not need to introduce 
a second speed. 

Theorem 2. In the optimal solution to the problem with the SingleSpeed Hard-Deadline model, all n chunks 
are of equal size W/n and executed at the same speed s. 

Proof. The proof is similar to the one of Theorem 1, except we do not need to study the case where $s_B > s_A$. □



6.2 Multiple speeds model 
6.2.1 Expected deadline 

In this section, we still deal with the problem of a divisible task of size W that we can split into an arbitrary number 
of chunks, but using the more general MultipleSpeeds model. We start by proving that all re-execution speeds are 
equal: 

Let us first introduce a preliminary Lemma: 

Lemma 7 (Convexity MultipleChunks). The problem of minimizing $A_0 + \alpha_0 x_0^2 + \alpha_1 x_1^2$ under the constraint
$\frac{\alpha_0}{x_0} + \frac{\alpha_1}{x_1} \leq A_1$, where $A_0$ is a constant, and $A_1$, $\alpha_0$, $\alpha_1$ are positive constants, is solved when $x_0 = x_1$, and when the
constraint is tight: $\frac{\alpha_0}{x_0} + \frac{\alpha_1}{x_1} = A_1$.

Proof. First remark that when $x_1$ is fixed, then according to Lemma 4, the constraint should be tight. Hence this is
true for the optimal solution (any optimal solution when the constraint is not tight can be improved by reducing one of
the variables).

To prove the result now that we know that the constraint is tight, it suffices to replace in the function we wish to
minimize $x_0 = \frac{\alpha_0}{A_1 - \frac{\alpha_1}{x_1}}$. Differentiating $A_0 + \alpha_0\left(\frac{\alpha_0}{A_1 - \frac{\alpha_1}{x_1}}\right)^2 + \alpha_1 x_1^2$ with respect to $x_1$ gives
$-\frac{2\alpha_1\alpha_0^3}{x_1^2\left(A_1 - \frac{\alpha_1}{x_1}\right)^3} + 2\alpha_1 x_1$.
Then we obtain that the function is minimized (by differentiating again, we can see that the function is convex) when
$-\frac{2\alpha_1}{x_1^2} x_0^3 + 2\alpha_1 x_1 = 0$, that is $-x_0^3 + x_1^3 = 0$, hence the result. □



Note that if Ai is nonpositive, then there is no solution. 

Lemma 8. In the MultipleSpeeds model, all re-execution speeds are equal in the optimal solution: $\exists \sigma, \forall i, \sigma_i = \sigma$,
and the deadline is tight.

Proof. This is a direct corollary of Lemma 7. If we consider the $w_i$ and $s_i$ to be fixed, then we can write
$\mathbb{E}(T)(\cup_i (w_i, s_i, \sigma_i)) = T_0 + \sum_i \frac{\alpha_i}{\sigma_i}$ and $\mathbb{E}(E)(\cup_i (w_i, s_i, \sigma_i)) = E_0 + \sum_i \alpha_i\sigma_i^2$, where $T_0$, $E_0$ and $\alpha_i$ are constant.
Assuming $D - T_0 > 0$ (otherwise there is no solution), we can apply Lemma 7; then the problem is minimized when the
deadline is tight, and when for all $i$, $\sigma_i = \frac{\sum_j \alpha_j}{D - T_0}$. □
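
For fixed chunk sizes and first-execution speeds, the common re-execution speed of Lemma 8 can be computed directly. In the sketch below, which is only an illustration, the expressions $T_0 = \sum_i (w_i/s_i + T_C)(1 + \lambda T_C)$ and $\alpha_i = \lambda w_i (w_i/s_i + T_C)$ are obtained by expanding the Expected-Deadline constraint; the function names and the sample values are assumptions.

```python
def common_reexec_speed(chunks, Tc, lam, D):
    """Common sigma making the expected deadline tight (chunks: list of (w, s)).

    Returns None when D - T0 <= 0, i.e. when no re-execution speed can help.
    """
    T0 = sum((w / s + Tc) * (1 + lam * Tc) for w, s in chunks)
    alphas = [lam * w * (w / s + Tc) for w, s in chunks]
    if D - T0 <= 0:
        return None
    return sum(alphas) / (D - T0)


def expected_time(chunks, sigmas, Tc, lam):
    """Expected-Deadline left-hand side for per-chunk re-execution speeds."""
    return sum((w / s + Tc) + lam * (w / s + Tc) * (w / sig + Tc)
               for (w, s), sig in zip(chunks, sigmas))


def reexec_energy(chunks, sigmas, Tc, lam):
    """Part of the expected energy that depends on sigma: sum of alpha_i sigma_i^2."""
    return sum(lam * w * (w / s + Tc) * sig**2
               for (w, s), sig in zip(chunks, sigmas))


# Illustrative values (assumptions): two chunks with fixed sizes and speeds.
chunks, Tc, lam, D = [(4.0, 2.0), (6.0, 3.0)], 1.0, 0.01, 6.5
sigma = common_reexec_speed(chunks, Tc, lam, D)
print(sigma, expected_time(chunks, [sigma, sigma], Tc, lam))
```

Any other pair of re-execution speeds that also makes the deadline tight should lead to a larger value of `reexec_energy`, which is what the lemma states.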



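We can now redefine

\[ \mathbb{E}(T)(\cup_i (w_i, s_i, \sigma_i)) = \mathbb{E}(T)(\cup_i (w_i, s_i), \sigma) \]
\[ \mathbb{E}(E)(\cup_i (w_i, s_i, \sigma_i)) = \mathbb{E}(E)(\cup_i (w_i, s_i), \sigma) \]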

Theorem 3. In the MultipleSpeeds model, all chunks have the same size $w_i = \frac{W}{n}$ and are executed at the same speed $s$, in the optimal solution.

Proof. We first prove that chunks are of equal size. Assume, by contradiction, that the optimal solution has two chunks of different sizes, for instance $w_1 < w_2$. These chunks are executed at speeds $s_1$ and $s_2$. Thanks to Lemma 8, both chunks are re-executed at the same speed $\sigma$. We consider the solution with two chunks of size $w = \frac{1}{2}(w_1 + w_2)$, executed at the same speed $s$ (to be defined later), and re-executed at speed $\sigma$ (the value of the re-execution speed in the optimal solution). The size and speed of the other chunks are kept the same. We show that the execution time is not greater than in the optimal solution, while the energy consumption is strictly smaller, hence leading to the contradiction.

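We have seen that

\[ \mathbb{E}(T)((w_1, s_1), (w_2, s_2), \sigma) = \frac{w_1}{s_1} + T_C + \frac{w_2}{s_2} + T_C + \lambda \left( \frac{w_1}{s_1} + T_C \right) \left( \frac{w_1}{\sigma} + T_C \right) + \lambda \left( \frac{w_2}{s_2} + T_C \right) \left( \frac{w_2}{\sigma} + T_C \right) \]

\[ \mathbb{E}(T)((w, s), (w, s), \sigma) = 2 \left( \frac{w}{s} + T_C \right) + 2 \lambda \left( \frac{w}{s} + T_C \right) \left( \frac{w}{\sigma} + T_C \right) \]

Hence,

\[ \mathbb{E}(T)((w_1, s_1), (w_2, s_2), \sigma) - \mathbb{E}(T)((w, s), (w, s), \sigma) = (1 + \lambda T_C) \left( \frac{w_1}{s_1} + \frac{w_2}{s_2} - \frac{2w}{s} \right) + \frac{\lambda}{\sigma} \left( \frac{w_1^2}{s_1} + \frac{w_2^2}{s_2} - \frac{2w^2}{s} \right) \]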

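Similarly, we know that:

\[ \mathbb{E}(E)((w_1, s_1), (w_2, s_2), \sigma) = w_1 s_1^2 + E_C + w_2 s_2^2 + E_C + \lambda \left( \frac{w_1}{s_1} + T_C \right) \left( w_1 \sigma^2 + E_C \right) + \lambda \left( \frac{w_2}{s_2} + T_C \right) \left( w_2 \sigma^2 + E_C \right) \]

\[ \mathbb{E}(E)((w, s), (w, s), \sigma) = 2 \left( w s^2 + E_C \right) + 2 \lambda \left( \frac{w}{s} + T_C \right) \left( w \sigma^2 + E_C \right) \]

and deduce

\[ \mathbb{E}(E)((w_1, s_1), (w_2, s_2), \sigma) - \mathbb{E}(E)((w, s), (w, s), \sigma) = \left( w_1 s_1^2 + w_2 s_2^2 - 2 w s^2 \right) + \lambda E_C \left( \frac{w_1}{s_1} + \frac{w_2}{s_2} - \frac{2w}{s} \right) + \lambda \sigma^2 \left( \frac{w_1^2}{s_1} + \frac{w_2^2}{s_2} - \frac{2w^2}{s} \right) \quad (14) \]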

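Let us now define

\[ s_A = \frac{2w}{\frac{w_1}{s_1} + \frac{w_2}{s_2}} = \frac{w_1 + w_2}{\frac{w_1}{s_1} + \frac{w_2}{s_2}}, \qquad s_B = \frac{2w^2}{\frac{w_1^2}{s_1} + \frac{w_2^2}{s_2}} = \frac{1}{2} \, \frac{(w_1 + w_2)^2}{\frac{w_1^2}{s_1} + \frac{w_2^2}{s_2}} \]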

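We then fix $s = \max(s_A, s_B)$. Then, since $s \ge s_A$, we have $\frac{w_1}{s_1} + \frac{w_2}{s_2} - \frac{2w}{s} \ge 0$, and since $s \ge s_B$, we have $\frac{w_1^2}{s_1} + \frac{w_2^2}{s_2} - \frac{2w^2}{s} \ge 0$. This ensures that $\mathbb{E}(T)((w_1, s_1), (w_2, s_2), \sigma) - \mathbb{E}(T)((w, s), (w, s), \sigma) \ge 0$. To prove that $\mathbb{E}(E)((w_1, s_1), (w_2, s_2), \sigma) - \mathbb{E}(E)((w, s), (w, s), \sigma) > 0$, there remains to study the term $w_1 s_1^2 + w_2 s_2^2 - 2 w s^2$.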

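Let us first suppose that $s_A > s_B$. Then we have $s = s_A$, and let us show that $w_1 s_1^2 + w_2 s_2^2 - 2 w s_A^2 \ge 0$:

\[
\begin{aligned}
& \left( \frac{w_1}{s_1} + \frac{w_2}{s_2} \right)^2 \left( w_1 s_1^2 + w_2 s_2^2 - (w_1 + w_2) \left( \frac{w_1 + w_2}{\frac{w_1}{s_1} + \frac{w_2}{s_2}} \right)^2 \right) \\
&= w_1^3 + w_1 w_2^2 \left( \frac{s_1}{s_2} \right)^2 + 2 \frac{w_1 w_2}{s_1 s_2} w_1 s_1^2 + w_2^3 + w_2 w_1^2 \left( \frac{s_2}{s_1} \right)^2 + 2 \frac{w_1 w_2}{s_1 s_2} w_2 s_2^2 - (w_1 + w_2)^3 \\
&= w_1 w_2^2 \left( \left( \frac{s_1}{s_2} \right)^2 + 2 \frac{s_2}{s_1} - 3 \right) + w_1^2 w_2 \left( \left( \frac{s_2}{s_1} \right)^2 + 2 \frac{s_1}{s_2} - 3 \right) \\
&= w_1 w_2^2 \, g\!\left( \frac{s_1}{s_2} \right) + w_1^2 w_2 \, g\!\left( \frac{s_2}{s_1} \right)
\end{aligned}
\]

where $g : u \mapsto u^2 + \frac{2}{u} - 3$. We know from the proof of Theorem 1 that $g$ is nonnegative on $\mathbb{R}_+^*$, hence $w_1 s_1^2 + w_2 s_2^2 - 2 w s_A^2 \ge 0$.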



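Finally, since $s > s_B$, we have $\frac{w_1^2}{s_1} + \frac{w_2^2}{s_2} - \frac{2w^2}{s} > 0$, and all other terms of $\mathbb{E}(E)((w_1, s_1), (w_2, s_2), \sigma) - \mathbb{E}(E)((w, s_A), (w, s_A), \sigma)$ are non-negative, hence proving that the new solution is strictly better than the optimal one, and leading to a contradiction.

Let us now suppose that $s_A \le s_B$. Then we have $s = s_B$. Moreover, we have $(w_2 - w_1) \left( \frac{w_2}{s_2} - \frac{w_1}{s_1} \right) \le 0$ (this comes directly from $s_A \le s_B$), and since we assume that $w_2 > w_1$, $\frac{w_2}{s_2} - \frac{w_1}{s_1} \le 0$. Let us show that $w_1 s_1^2 + w_2 s_2^2 - 2 w s_B^2 > 0$:

\[
\begin{aligned}
& 4 \left( \frac{w_1^2}{s_1} + \frac{w_2^2}{s_2} \right)^2 \left( w_1 s_1^2 + w_2 s_2^2 - (w_1 + w_2) \left( \frac{1}{2} \, \frac{(w_1 + w_2)^2}{\frac{w_1^2}{s_1} + \frac{w_2^2}{s_2}} \right)^2 \right) \\
&= 4 w_1^5 + 8 w_1^3 w_2^2 \frac{s_1}{s_2} + 4 w_1 w_2^4 \left( \frac{s_1}{s_2} \right)^2 + 4 w_2^5 + 8 w_1^2 w_2^3 \frac{s_2}{s_1} + 4 w_1^4 w_2 \left( \frac{s_2}{s_1} \right)^2 - (w_1 + w_2)^5 \\
&= 3 \left( w_1^5 + w_2^5 \right) + w_1^3 w_2^2 \left( 8 \frac{s_1}{s_2} - 10 \right) + w_1^2 w_2^3 \left( 8 \frac{s_2}{s_1} - 10 \right) + w_1 w_2^4 \left( 4 \left( \frac{s_1}{s_2} \right)^2 - 5 \right) + w_1^4 w_2 \left( 4 \left( \frac{s_2}{s_1} \right)^2 - 5 \right)
\end{aligned}
\]

Now, because $\frac{w_1}{s_1} \ge \frac{w_2}{s_2}$, we can bound the last expression. Let $u = \frac{s_1}{s_2}$ (and hence $w_1 \ge u \times w_2$):

\[
\begin{aligned}
& 4 \left( \frac{w_1^2}{s_1} + \frac{w_2^2}{s_2} \right)^2 \left( w_1 s_1^2 + w_2 s_2^2 - (w_1 + w_2) \, s_B^2 \right) \\
&\ge w_2^5 \left( 3 \left( u^5 + 1 \right) + u^3 (8u - 10) + u^2 \left( \frac{8}{u} - 10 \right) + u \left( 4u^2 - 5 \right) + u^4 \left( \frac{4}{u^2} - 5 \right) \right) \\
&= w_2^5 \left( 3u^5 + 3u^4 - 6u^3 - 6u^2 + 3u + 3 \right) \\
&= 3 w_2^5 (u - 1)^2 (u + 1)^3
\end{aligned}
\]

Since $w_2 > w_1$, $0 < u < 1$, and this polynomial is strictly positive, hence we have $w_1 s_1^2 + w_2 s_2^2 - 2 w s_B^2 > 0$.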

Finally, we can conclude that in both cases, $\mathbb{E}(E)((w_1, s_1), (w_2, s_2), \sigma) - \mathbb{E}(E)((w, s), (w, s), \sigma) > 0$ with $s = \max(s_A, s_B)$, so there exists a better solution with two chunks of the same size, hence leading to a contradiction.

We have proven that all chunks have the same size. We use the same line of reasoning to prove that all chunks are executed at the same speed $s$. If there are two chunks executed at speeds $s_1 < s_2$ (with $w_1 = w_2 = w$), then we have $s_A = s_B$. Considering that $s = s_A$, it is easy to see that $w_1 s_1^2 + w_2 s_2^2 - 2 w s_A^2 > 0$ since $w_1 w_2^2 \, g\!\left( \frac{s_1}{s_2} \right) + w_1^2 w_2 \, g\!\left( \frac{s_2}{s_1} \right) > 0$. Indeed, $g$ is null only at 1, and $s_1 \ne s_2$. We exhibit a strictly better solution, hence showing a contradiction. This concludes the proof. □
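The exchange argument of this proof can be replayed numerically. The sketch below replaces two unequal chunks by two chunks of size $w = \frac{1}{2}(w_1 + w_2)$ executed at speed $\max(s_A, s_B)$ and reports the resulting decrease in expected time and expected energy. One instance falls in the case $s_A > s_B$ and the other in the case $s_A \le s_B$. All names and numerical values are assumptions chosen for the illustration.

```python
def chunk_time(w, s, sigma, lam, Tc):
    # Expected time of one chunk: first execution plus weighted re-execution.
    return (w / s + Tc) + lam * (w / s + Tc) * (w / sigma + Tc)


def chunk_energy(w, s, sigma, lam, Tc, Ec):
    # Expected energy of one chunk: first execution plus weighted re-execution.
    return (w * s ** 2 + Ec) + lam * (w / s + Tc) * (w * sigma ** 2 + Ec)


def equalize(w1, s1, w2, s2):
    # Average size and the speed max(s_A, s_B) used in the proof.
    w = (w1 + w2) / 2
    s_a = 2 * w / (w1 / s1 + w2 / s2)
    s_b = 2 * w ** 2 / (w1 ** 2 / s1 + w2 ** 2 / s2)
    return w, max(s_a, s_b)


lam, Tc, Ec, sigma = 0.05, 0.01, 0.01, 2.0  # illustrative values
results = []
for w1, s1, w2, s2 in [(0.3, 1.0, 0.7, 1.4), (0.3, 0.5, 0.7, 1.4)]:
    w, s = equalize(w1, s1, w2, s2)
    t_old = chunk_time(w1, s1, sigma, lam, Tc) + chunk_time(w2, s2, sigma, lam, Tc)
    t_new = 2 * chunk_time(w, s, sigma, lam, Tc)
    e_old = (chunk_energy(w1, s1, sigma, lam, Tc, Ec)
             + chunk_energy(w2, s2, sigma, lam, Tc, Ec))
    e_new = 2 * chunk_energy(w, s, sigma, lam, Tc, Ec)
    results.append((t_old - t_new, e_old - e_new))
    print(f"time decrease = {t_old - t_new:.6f}, energy decrease = {e_old - e_new:.6f}")
```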

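Thanks to this result, we know that the $n$ chunks problem can be rewritten as follows: find $s$ (together with the common re-execution speed $\sigma$) such that

• $\left( \frac{W}{s} + n T_C \right) + \frac{\lambda}{n} \left( \frac{W}{s} + n T_C \right) \left( \frac{W}{\sigma} + n T_C \right) \le D$

• in order to minimize $W s^2 + n E_C + \frac{\lambda}{n} \left( \frac{W}{s} + n T_C \right) \left( W \sigma^2 + n E_C \right)$

One can see that this reduces to the SingleChunk MultipleSpeeds Expected-Deadline problem where

• $\lambda \leftarrow \frac{\lambda}{n}$

• $T_C \leftarrow n T_C$

• $E_C \leftarrow n E_C$

and allows us to write the problem to solve as a two-parameter function, where the re-execution speed is obtained from the tight deadline:

\[ (n, s) \mapsto W s^2 + n E_C + \frac{\lambda}{n} \left( \frac{W}{s} + n T_C \right) \left( W \left( \frac{\frac{\lambda}{n} W \left( \frac{W}{s} + n T_C \right)}{D - (1 + \lambda T_C) \left( \frac{W}{s} + n T_C \right)} \right)^2 + n E_C \right) \quad (15) \]

which can be minimized numerically.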
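A minimal sketch of such a numerical minimization is given below. It assumes that the re-execution speed is obtained from the tight expected deadline, $\sigma = \frac{\lambda}{n} W \left( \frac{W}{s} + n T_C \right) / \left( D - (1 + \lambda T_C) \left( \frac{W}{s} + n T_C \right) \right)$, enumerates $n$, and uses a ternary search on $s$, which relies on the objective being convex in $s$ over the feasible range. The function names, the bound `n_max` and the parameter values are assumptions for this example. This is not the Maple procedure used for the experiments.

```python
import math


def energy_15(n, s, W, D, lam, Tc, Ec):
    base = W / s + n * Tc               # failure-free time of the n chunks
    slack = D - (1 + lam * Tc) * base   # time budget left for re-executions
    if slack <= 0:
        return math.inf                 # the expected deadline cannot be met
    sigma = (lam / n) * W * base / slack  # speed that makes the deadline tight
    return W * s ** 2 + n * Ec + (lam / n) * base * (W * sigma ** 2 + n * Ec)


def ternary_min(f, lo, hi, iters=200):
    # Ternary search for the minimizer of a convex function on [lo, hi].
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def minimize_15(W, D, lam, Tc, Ec, n_max=200):
    best = (math.inf, None, None)
    for n in range(1, n_max + 1):
        budget = D / (1 + lam * Tc) - n * Tc
        if budget <= 0:
            break                       # too many checkpoints for this deadline
        s_min = W / budget              # speeds at or below s_min are infeasible
        s = ternary_min(lambda x: energy_15(n, x, W, D, lam, Tc, Ec),
                        s_min, 100 * s_min)
        e = energy_15(n, s, W, D, lam, Tc, Ec)
        if e < best[0]:
            best = (e, n, s)
    return best


W, D, lam, Tc, Ec = 1.0, 1.0, 0.1, 1e-3, 1e-3  # illustrative values
E_best, n_best, s_best = minimize_15(W, D, lam, Tc, Ec)
print(E_best, n_best, s_best)
```

Enumerating $n$ is a design choice made for simplicity: $n$ is an integer, and the loop stops as soon as the checkpoints alone exhaust the deadline.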



6.2.2 Hard deadline 

In this section, the constraint on the execution time can be written as:

\[ \sum_i \left( \frac{w_i}{s_i} + T_C + \frac{w_i}{\sigma_i} + T_C \right) \le D \]

Lemma 9. In the MultipleSpeeds Hard-Deadline model with divisible chunk, the deadline should be tight. 

Proof. This result is obvious with Lemma 4: if we have a solution for which the deadline is not tight, then by fixing every variable but $\sigma_1$ (the re-execution speed of the first task), we can improve the solution until the deadline is tight. □

Lemma 10. In the optimal solution, for all $i, j$, $\lambda \left( \frac{w_i}{s_i} + T_C \right) \sigma_i^3 = \lambda \left( \frac{w_j}{s_j} + T_C \right) \sigma_j^3$.

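Proof. Consider any solution to our problem. Thanks to Lemma 9, we know that the deadline should be tight. Let $T_i$ and $T_j$ be two tasks with re-execution speeds $\sigma_i, \sigma_j$. We show that those speeds can be optimally defined such that $\lambda \left( \frac{w_i}{s_i} + T_C \right) \sigma_i^3 = \lambda \left( \frac{w_j}{s_j} + T_C \right) \sigma_j^3$. Let us call $u_i = \lambda \left( \frac{w_i}{s_i} + T_C \right)$ and $u_j = \lambda \left( \frac{w_j}{s_j} + T_C \right)$.

The minimization problem for those speeds can be written as $A_0 + u_i w_i \sigma_i^2 + u_j w_j \sigma_j^2$ under the constraint that $A_1 + \frac{w_i}{\sigma_i} + \frac{w_j}{\sigma_j} = D$, where neither $A_0$ nor $A_1$ depends on $\sigma_i, \sigma_j$.

Replacing $\sigma_i = \frac{w_i}{D - A_1 - w_j / \sigma_j}$ in the function we need to minimize, we obtain $A_0 + u_i w_i \left( \frac{w_i}{D - A_1 - w_j / \sigma_j} \right)^2 + u_j w_j \sigma_j^2$. A simple differentiation gives $-2 w_j u_i \frac{w_i^3}{\sigma_j^2 \left( D - A_1 - w_j / \sigma_j \right)^3} + 2 u_j w_j \sigma_j$. Another differentiation shows the convexity of the function we want to minimize. Hence one can see that the function is minimized when $u_i \sigma_i^3 = u_j \sigma_j^3$. □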
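Lemma 10 suggests a direct way to compute re-execution speeds when chunk sizes and first-execution speeds are fixed. Writing $\sigma_i = c / u_i^{1/3}$ with $u_i = \lambda (w_i / s_i + T_C)$, the tight deadline fixes the constant $c$. In the sketch below, `R` denotes the time left for all re-executions ($D - A_1$ in the two-chunk notation of the proof). This closed form is our own consequence of the lemma, and all names and values are assumptions for the illustration.

```python
def lemma10_speeds(chunks, lam, Tc, R):
    # u_i = lam * (w_i / s_i + Tc); Lemma 10 states u_i * sigma_i**3 is the same for all i.
    u = [lam * (w / s + Tc) for w, s in chunks]
    # With sigma_i = c / u_i**(1/3), the budget sum_i w_i / sigma_i = R fixes c.
    c = sum(w * ui ** (1 / 3) for (w, _), ui in zip(chunks, u)) / R
    return [c / ui ** (1 / 3) for ui in u]


def reexecution_energy(chunks, sigmas, lam, Tc):
    # Part of the expected energy that depends on the re-execution speeds.
    return sum(lam * (w / s + Tc) * w * sig ** 2
               for (w, s), sig in zip(chunks, sigmas))


# Illustrative instance: two chunks given as (size, first-execution speed).
chunks = [(0.3, 1.0), (0.7, 2.0)]
lam, Tc, R = 0.1, 0.01, 0.4

sigmas = lemma10_speeds(chunks, lam, Tc, R)
print(sigmas, reexecution_energy(chunks, sigmas, lam, Tc))
```

Any other pair of speeds that uses the same time budget `R` should yield a larger value of `reexecution_energy`.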



Lemma 11. If we enforce the condition that the execution speeds of the chunks are all equal, and that the re-execution 
speeds of the chunks are all equal, then all chunks should have same size in the optimal solution. 

Proof. This result is obvious since the problem can be reformulated as the minimization of $\alpha \sum_i w_i + \beta \sum_i w_i^2$, where neither $\alpha$ nor $\beta$ depends on any $w_i$, under the constraints $\gamma \sum_i w_i + C \le D$ and $\sum_i w_i = W$. It is easy to see the result when there are only two chunks since there is only one variable, and the problem generalizes well to the case of $n$ chunks. □



We have not been able to prove a stronger result than Lemma 11. However, we conjecture the following result:



Conjecture 1. In the MultipleSpeeds Hard-Deadline model, in the optimal solution, the re-execution speeds are identical and the deadline is tight. The re-execution speed is equal to $\sigma = \frac{W}{(D - 2 n T_C) s - W} \, s$. Furthermore, the chunks should have the same size $\frac{W}{n}$ and should be executed at the same speed $s$.



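This conjecture reduces the problem to the SingleChunk MultipleSpeeds problem where

• $\lambda \leftarrow \frac{\lambda}{n}$

• $T_C \leftarrow n T_C$

• $E_C \leftarrow n E_C$

and allows us to write the problem to solve as a two-parameter function:

\[ (n, s) \mapsto W s^2 + n E_C + \frac{\lambda}{n} \left( \frac{W}{s} + n T_C \right) \left( W \left( \frac{W}{(D - 2 n T_C) s - W} \, s \right)^2 + n E_C \right) \quad (16) \]

which can be solved numerically.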
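The two-parameter function of the hard-deadline case can be minimized in the same spirit. The sketch below uses the re-execution speed of Conjecture 1, $\sigma = W s / ((D - 2 n T_C) s - W)$, enumerates $n$, and runs a ternary search on $s$, which relies on the objective being convex in $s$ over the feasible range. The function names, `n_max` and the parameter values are assumptions, and the result is only as reliable as the conjecture itself.

```python
import math


def energy_16(n, s, W, D, lam, Tc, Ec):
    denom = (D - 2 * n * Tc) * s - W
    if denom <= 0:
        return math.inf                 # the worst-case deadline cannot be met
    sigma = W * s / denom               # re-execution speed from Conjecture 1
    return (W * s ** 2 + n * Ec
            + (lam / n) * (W / s + n * Tc) * (W * sigma ** 2 + n * Ec))


def ternary_min(f, lo, hi, iters=200):
    # Ternary search for the minimizer of a convex function on [lo, hi].
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def minimize_16(W, D, lam, Tc, Ec, n_max=200):
    best = (math.inf, None, None)
    for n in range(1, n_max + 1):
        budget = D - 2 * n * Tc
        if budget <= 0:
            break                       # checkpoints alone exceed the deadline
        s_min = W / budget              # speeds at or below s_min are infeasible
        s = ternary_min(lambda x: energy_16(n, x, W, D, lam, Tc, Ec),
                        s_min, 100 * s_min)
        e = energy_16(n, s, W, D, lam, Tc, Ec)
        if e < best[0]:
            best = (e, n, s)
    return best


W, D, lam, Tc, Ec = 1.0, 1.0, 0.1, 1e-3, 1e-3  # illustrative values
E_hard, n_hard, s_hard = minimize_16(W, D, lam, Tc, Ec)
print(E_hard, n_hard, s_hard)
```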



7 Simulations 

7.1 Simulation settings 

We performed a large set of simulations in order to illustrate the differences between all the models studied in this 
paper, and to show to which extent each additional degree of freedom improves the results, i.e., allowing for multiple 
speeds instead of a single speed, or for multiple smaller chunks instead of a single large chunk. All these experiments 
are conducted under both constraint types, expected and hard deadlines. 

Figure 1: Comparison with SingleChunk SingleSpeed.

Figure 2: Comparison with SingleChunk SingleSpeed.

Figure 3: Comparison Hard-Deadline versus Expected-Deadline.

We envision reasonable settings by varying parameters within the following ranges:

• $\frac{W}{D} \in [0.2, 10]$

• $\frac{T_C}{D} \in [10^{-4}, 10^{-2}]$

• $E_C \in [10^{-3}, 10^{3}]$

• $\lambda \in [10^{-8}, 1]$.

In addition, we set the deadline to 1. Note that since we study $\frac{W}{D}$ and $\frac{T_C}{D}$ instead of $W$ and $T_C$, we do not need to study how the variation of the deadline impacts the simulation: this is already taken into account.

We use the Maple software to solve numerically the different minimization problems. Results are shown from two perspectives: on the one hand (Figures 1 and 2), for a given constraint (Hard-Deadline or Expected-Deadline), we normalize all variants according to SingleSpeed SingleChunk, under the considered constraint. For instance, on the plots, the energy consumed by MultipleChunks MultipleSpeeds (denoted as MCMS) for Hard-Deadline is divided by the energy consumed by SingleChunk SingleSpeed (denoted as SCSS) for Hard-Deadline, while the energy of MultipleChunks SingleSpeed (denoted as MCSS) for Expected-Deadline is normalized by the energy of SingleChunk SingleSpeed for Expected-Deadline.

On the other hand (Figures 3 and 4), we study the impact of the constraint hardness on the energy consumption. For each solution form (SingleSpeed or MultipleSpeeds, and SingleChunk or MultipleChunks), we plot the ratio of the energy consumed for Expected-Deadline over the energy consumed for Hard-Deadline.

Note that for each figure, we plot for each function several values, one for each value of $T_C / D$ (hence the vertical intervals for points where $T_C / D$ has an impact). In addition, the lower the value of $T_C / D$, the lower the energy consumption.

7.2 Comparison with single speed 

At first, we observe that the results are identical for any value of $W/D$, up to a translation of $E_C$ (see $(W/D = 0.2, E_C = 10^{-3})$ vs. $(W/D = 5, E_C = 1000)$ on Figures 1 and 2, or see $(W/D = 1, E_C = 10^{-3})$ vs. $(W/D = 5, E_C = 0.1)$ on Figures 1 and 2 for instance).

Then the next observation is that for Expected-Deadline, with a small $\lambda$ ($< 10^{-2}$), MultipleChunks or MultipleSpeeds models do not improve the energy ratio. This is due to the fact that, in both expressions for energy and for execution time, the re-execution term is negligible relative to the execution one, since it has a weighting factor $\lambda$. However, when $\lambda$ increases, if the energy of a checkpoint is small relative to the total work (which is the general case), we can see a huge improvement (between 25% and 75% energy saving) with MultipleChunks.

On the contrary, as expected, for small values of $\lambda$, re-executing at a different speed has a huge impact for Hard-Deadline, where we can gain up to 75% energy when the failure rate is low. We can indeed run at around half speed during the first execution (leading to an energy ratio of $1/2^2 = 25\%$, i.e., a 75% saving), and at a high speed for the second one, because the very low failure probability avoids the explosion of the expected energy consumption. For both MultipleChunks and SingleChunk, this energy ratio increases with $\lambda$ (the energy consumed by the second execution cannot be neglected any more, and both executions need to be more balanced), the latter being more sensitive to $\lambda$. But the former is the only configuration where $T_C$ has a significant impact: its performance decreases with $T_C$; still it remains strictly better than SingleChunk MultipleSpeeds.
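The half-speed argument can be illustrated on a single chunk under a hard deadline. The sketch below compares SingleChunk SingleSpeed, assuming a tight deadline $2(W/s + T_C) = D$, with SingleChunk MultipleSpeeds, using the hard-deadline two-parameter function with $n = 1$ and a plain grid scan over $s$. For a very small $\lambda$ the energy ratio is close to $1/2^2 = 25\%$. The function names, the scan range and the parameter values are assumptions for this example.

```python
import math


def scss_hard(W, D, lam, Tc, Ec):
    # Single speed: both executions run at s, and 2 * (W / s + Tc) = D (assumed tight).
    s = 2 * W / (D - 2 * Tc)
    return (W * s ** 2 + Ec) * (1 + lam * (W / s + Tc))


def scms_hard(W, D, lam, Tc, Ec, steps=20000):
    # Two speeds: scan the first-execution speed s just above its lower bound.
    s_min = W / (D - 2 * Tc)
    best = math.inf
    for k in range(1, steps + 1):
        s = s_min * (1 + 2.0 * k / steps)        # s in (s_min, 3 * s_min]
        sigma = W * s / ((D - 2 * Tc) * s - W)   # tight worst-case deadline
        e = W * s ** 2 + Ec + lam * (W / s + Tc) * (W * sigma ** 2 + Ec)
        best = min(best, e)
    return best


W, D, lam, Tc, Ec = 1.0, 1.0, 1e-8, 1e-4, 1e-3  # illustrative values
ratio = scms_hard(W, D, lam, Tc, Ec) / scss_hard(W, D, lam, Tc, Ec)
print(f"energy ratio SCMS / SCSS = {ratio:.4f}")
```

The ratio stays slightly above 25% because the re-execution speed cannot be infinite and the checkpoint costs are not zero.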

7.3 Comparison between Expected-Deadline and Hard-Deadline 

As before, the value of $W/D$ does not change the energy ratios up to translations of $E_C$. As expected, the difference between the Expected-Deadline and Hard-Deadline models is very important for the SingleSpeed variant: when the energy of the re-execution is negligible (because of the failure rate parameter), it would be better to spend as little time as possible doing the re-execution in order to have a speed as slow as possible for the first execution. However, we are limited in the SingleSpeed Hard-Deadline model by the fact that the re-execution time is fully taken into account (its speed is the same as the first execution, and there is no parameter $\lambda$ to render it negligible).

Furthermore, when $\lambda$ is minimum, MultipleSpeeds consumes the same energy for Expected-Deadline and for Hard-Deadline. Indeed, as expected, the $\lambda$ in the energy function makes it possible for the re-execution speed to be maximal: it has little impact on the energy, and it is optimal for the execution time; this way we can focus on slowing down the first execution of each chunk. For Hard-Deadline, we already run the first execution at half speed, thus we cannot save more energy, even considering Expected-Deadline instead. When $\lambda$ increases, speeds of Hard-Deadline cannot be lowered but the expected execution time decreases, making room for a downgrade of the speeds in the Expected-Deadline problems.

8 Conclusion 

In this work, we have studied the energy consumption of a divisible computational workload on volatile platforms. In particular, we have studied the expected energy consumption under different deadline constraints: a soft deadline (a deadline for the expected execution time), and a hard deadline (a deadline for the worst case execution time).

We have been able to show mathematically, for all cases but one, that when using the MultipleChunks model, then (i) every chunk should be equally sized; (ii) every execution speed should be equal; and (iii) every re-execution speed should also be equal. This problem remains open in the MultipleSpeeds Hard-Deadline variant.

Through a set of extensive simulations, we were able to show the following: (i) when the fault parameter $\lambda$ is small, for Expected-Deadline constraints, the SingleChunk SingleSpeed model leads to almost optimal energy consumption. This is not true for the Hard-Deadline model, which accounts equally for execution and re-execution, thereby leading to higher energy consumption. Therefore, for the Hard-Deadline model and for small values of $\lambda$, the model of choice should be the SingleChunk MultipleSpeeds model, which is not intuitive. (ii) When the fault rate $\lambda$ increases, using a single chunk is no longer energy-efficient, and one should focus on the MultipleChunks MultipleSpeeds model for both deadline types.

An interesting direction for future work is to extend this study to the case of an application workflow: instead of dealing with a single divisible task, we would deal with a DAG of tasks, that could be either divisible (checkpoints can take place anytime) or atomic (checkpoints can only take place at the end of the execution of some tasks). Again, we can envision both soft and hard constraints on the execution time, and we can keep the same model with a single re-execution per chunk/task, at the same speed or possibly at a different speed. Deriving complexity results and heuristics to solve this difficult problem is likely to be very challenging, but could have a dramatic impact on reducing the energy consumption of many scientific applications.